test driven development
Leveraging Test Driven Development with Large Language Models for Reliable and Verifiable Spreadsheet Code Generation: A Research Framework
Large Language Models (LLMs), such as ChatGPT, are increasingly used to generate both traditional software code and spreadsheet logic. Despite their impressive generative capabilities, these models frequently exhibit critical issues such as hallucinations, subtle logical inconsistencies, and syntactic errors. These risks are particularly acute in high-stakes domains like financial modelling and scientific computation, where accuracy and reliability are paramount. This position paper proposes a structured research framework that integrates the proven software engineering practice of Test-Driven Development (TDD) with LLM-driven generation to enhance the correctness and reliability of generated outputs, and user confidence in them. We hypothesise that a "test first" methodology provides both technical constraints and cognitive scaffolding, guiding LLM outputs towards more accurate, verifiable, and comprehensible solutions. Our framework is applicable across diverse programming contexts, from spreadsheet formula generation to scripting languages such as Python and strongly typed languages like Rust. It includes an explicitly outlined experimental design with clearly defined participant groups, evaluation metrics, and illustrative TDD-based prompting examples. By emphasising test-driven thinking, we aim to improve computational thinking, prompt engineering skills, and user engagement, particularly benefiting spreadsheet users who often lack formal programming training yet face serious consequences from logical errors. We invite collaboration to refine and empirically evaluate this approach, ultimately aiming to establish responsible and reliable LLM integration in both educational and professional development practices.
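To make the "test first" idea concrete, here is a minimal sketch in Python. It is not taken from the paper: the compound-interest scenario, the names `TEST_CASES`, `future_value`, and `run_tests`, and the tolerance values are all illustrative assumptions. The expected behaviour is written down as test cases before any implementation exists, and a candidate (standing in for LLM output) is accepted only if every case passes.

```python
import math

# Step 1 (test first): expected behaviour, written before any code is generated.
# Each case is (principal, annual_rate, years, expected_future_value).
# The scenario and values are hypothetical, chosen to resemble a spreadsheet formula.
TEST_CASES = [
    (1000.0, 0.05, 0, 1000.0),
    (1000.0, 0.05, 1, 1050.0),
    (1000.0, 0.05, 2, 1102.5),
    (0.0, 0.05, 10, 0.0),
]


# Step 2: a candidate implementation, standing in for LLM-generated code.
def future_value(principal, annual_rate, years):
    """Compound interest, compounded once per year."""
    return principal * (1 + annual_rate) ** years


def run_tests(candidate):
    """Return a list of failure descriptions; an empty list means all tests pass."""
    failures = []
    for principal, rate, years, expected in TEST_CASES:
        actual = candidate(principal, rate, years)
        # Compare floats with a tolerance rather than exact equality.
        if not math.isclose(actual, expected, rel_tol=1e-9, abs_tol=1e-9):
            failures.append(
                f"inputs=({principal}, {rate}, {years}): "
                f"expected {expected}, got {actual}"
            )
    return failures


if __name__ == "__main__":
    problems = run_tests(future_value)
    print("all tests pass" if not problems else "\n".join(problems))
```

A candidate that uses simple rather than compound interest agrees with the first two cases but fails the two-year case, which is the kind of subtle logical error the tests are meant to expose.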
LLM4TDD: Best Practices for Test Driven Development Using Large Language Models
Piya, Sanyogita, Sullivan, Allison
In today's society, we are becoming increasingly dependent on software systems. However, we also constantly witness the negative impacts of buggy software. Program synthesis aims to improve software correctness by automatically generating the program given an outline of the expected behavior. For decades, program synthesis has been an active research field, with recent approaches looking to incorporate Large Language Models to help generate code. This paper explores the concept of LLM4TDD, where we guide Large Language Models to generate code iteratively using a test-driven development methodology. We conduct an empirical evaluation using ChatGPT and coding problems from LeetCode to investigate the impact of different test, prompt and problem attributes on the efficacy of LLM4TDD.
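The iterative, test-driven generation described above can be sketched as a small loop in Python. This is an assumption-laden illustration, not the paper's protocol: `tdd_loop`, `failing_tests`, and `make_stub_generator` are hypothetical names, and the stub generator simply replays canned attempts where a real setup would query a model with the failing tests as feedback.

```python
def failing_tests(candidate, tests):
    """Return the (args, expected) pairs that the candidate gets wrong."""
    failed = []
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                failed.append((args, expected))
        except Exception:
            # A crash counts as a failed test.
            failed.append((args, expected))
    return failed


def tdd_loop(generate, tests, max_rounds=5):
    """Ask for candidates until one passes every test or rounds run out.

    `generate` receives the failing tests from the previous round as feedback.
    Returns (candidate, rounds_used), or (None, max_rounds) if nothing passed.
    """
    feedback = []
    for round_number in range(1, max_rounds + 1):
        candidate = generate(feedback)
        feedback = failing_tests(candidate, tests)
        if not feedback:
            return candidate, round_number
    return None, max_rounds


def make_stub_generator(attempts):
    """Hypothetical stand-in for a model: replays canned attempts in order."""
    remaining = iter(attempts)

    def generate(feedback):
        # A real generator would build a prompt from `feedback`; this one ignores it.
        return next(remaining)

    return generate


# Tests written first: a case-insensitive palindrome check.
PALINDROME_TESTS = [
    (("aba",), True),
    (("abc",), False),
    (("Aba",), True),
]

if __name__ == "__main__":
    attempts = [
        lambda s: s == s[::-1],                   # ignores case, fails "Aba"
        lambda s: s.lower() == s.lower()[::-1],   # handles case
    ]
    solution, rounds = tdd_loop(make_stub_generator(attempts), PALINDROME_TESTS)
    print(f"accepted after {rounds} round(s)")
```

Keeping the test runner separate from the generator means the acceptance criterion stays fixed while prompts and models vary.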
The Modern Python 3 Bootcamp
- Work through nearly 200 exercises and quizzes
- Learn about all of the latest features in Python 3.6
- Use Python to create an automated web crawler and scraper
- Make complex HTTP requests to APIs using Python
- Master the quirks of Python style and conventions
- Really understand object-oriented programming in Python
- Learn testing and TDD (Test Driven Development) with Python
- Write your own decorators and higher-order functions
- Write your own generators and other iterators
- Confidently work with lambdas
Weights & Biases - ML Best Practices: Test Driven Development at Latent Space
I sat down with the Latent Space team to talk about best practices around collaboration and managing model iteration. In machine learning, bugs may affect the distribution of possible models more than any particular instance, making traditional deterministic tests misleading. Because of this, a test-driven development framework for large ML models must account for the statistical nature of training. This is especially crucial when multiple researchers and engineers are contributing to the same model, as it's easy to silently introduce regressions into a codebase. Here, the team shares some insights about how this new form of test-driven development has been the key to moving quickly on a large-scale collaborative project.
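One way to read "accounting for the statistical nature of training" is to test a distribution of runs rather than a single run. The sketch below is an assumption, not the Latent Space team's method: `train_and_evaluate` is a hypothetical stand-in that returns a noisy score, and the baseline, tolerance, and two-standard-error margin are illustrative choices.

```python
import random
import statistics


def train_and_evaluate(seed):
    """Hypothetical stand-in for a training run: returns a noisy accuracy."""
    rng = random.Random(seed)
    return 0.90 + rng.uniform(-0.01, 0.01)


def check_no_regression(run, seeds, baseline_mean, tolerance):
    """Run once per seed and compare the mean score against a baseline.

    The check passes if the mean is no more than `tolerance` plus two
    standard errors below `baseline_mean`. Returns (passed, mean).
    """
    scores = [run(seed) for seed in seeds]
    mean = statistics.mean(scores)
    stderr = statistics.stdev(scores) / len(scores) ** 0.5
    passed = mean >= baseline_mean - tolerance - 2 * stderr
    return passed, mean


if __name__ == "__main__":
    ok, mean = check_no_regression(
        train_and_evaluate, seeds=range(20), baseline_mean=0.90, tolerance=0.01
    )
    print(f"mean accuracy {mean:.4f}, regression check passed: {ok}")
```

A single-seed assertion such as `accuracy == 0.9012` would break on harmless changes and could miss a real shift in the average; comparing aggregate statistics across seeds is more robust, at the cost of more compute per test.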